On Evaluation of Bangla Word Analogies
This paper presents a high-quality dataset for evaluating the quality of
Bangla word embeddings, which is a fundamental task in the field of Natural
Language Processing (NLP). Despite being the 7th most-spoken language in the
world, Bangla is a low-resource language and popular NLP models fail to perform
well. Developing a reliable evaluation test set for Bangla word embeddings is
crucial for benchmarking and guiding future research. We provide a
Mikolov-style word analogy evaluation set specifically for Bangla, with a
sample size of 16678, as well as a translated and curated version of the
Mikolov dataset, which contains 10594 samples for cross-lingual research. Our
experiments with different state-of-the-art embedding models reveal that Bangla
has its own unique characteristics, and current embeddings for Bangla still
struggle to achieve high accuracy on both datasets. We suggest that future
research should focus on training models with larger datasets and considering
the unique morphological characteristics of Bangla. This study represents the
first step towards building a reliable NLP system for the Bangla language.
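For readers unfamiliar with Mikolov-style analogy evaluation, the following is a minimal sketch of how such a test set is typically scored, assuming the common vector-offset (3CosAdd) method: for a question "a is to b as c is to ?", the predicted word is the one whose vector is closest in cosine similarity to b - a + c. The abstract does not state the exact scoring procedure used in the paper, so the function names (`solve_analogy`, `analogy_accuracy`), the policy of skipping questions with out-of-vocabulary words, and the toy English vectors are all illustrative assumptions.

```python
import math

def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)

def cosine(u, v):
    """Cosine similarity between two vectors."""
    nu, nv = normalize(u), normalize(v)
    return sum(a * b for a, b in zip(nu, nv))

def solve_analogy(emb, a, b, c):
    """Return the word d maximizing cos(d, b - a + c), excluding a, b, c."""
    va, vb, vc = normalize(emb[a]), normalize(emb[b]), normalize(emb[c])
    target = [y - x + z for x, y, z in zip(va, vb, vc)]
    candidates = [w for w in emb if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(emb[w], target))

def analogy_accuracy(emb, questions):
    """Fraction of in-vocabulary questions (a, b, c, d) answered correctly."""
    # Assumption: questions containing an out-of-vocabulary word are skipped.
    covered = [q for q in questions if all(w in emb for w in q)]
    if not covered:
        return 0.0
    correct = sum(solve_analogy(emb, a, b, c) == d for a, b, c, d in covered)
    return correct / len(covered)

# Toy, hand-made vectors for illustration only (not real embeddings).
TOY = {
    "man": [1.0, 0.0, 0.0],
    "woman": [1.0, 1.0, 0.0],
    "king": [1.0, 0.0, 1.0],
    "queen": [1.0, 1.0, 1.0],
    "apple": [-1.0, 0.5, 0.0],
}

QUESTIONS = [
    ("man", "woman", "king", "queen"),
    ("man", "king", "woman", "queen"),
    ("man", "woman", "king", "empress"),  # skipped: "empress" is not in TOY
]

if __name__ == "__main__":
    print(solve_analogy(TOY, "man", "woman", "king"))
    print(analogy_accuracy(TOY, QUESTIONS))
```

In practice the dictionary would be replaced by vectors loaded from a trained Bangla embedding model, and the question list by the analogy file; reporting accuracy per relation category is a common refinement.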
Exploring Challenges of Deploying BERT-based NLP Models in Resource-Constrained Embedded Devices
BERT-based neural architectures have established themselves as popular
state-of-the-art baselines for many downstream NLP tasks. However, these
architectures are data-hungry and consume a lot of memory and energy, often
hindering their deployment in many real-time, resource-constrained
applications. Existing lighter versions of BERT (e.g., DistilBERT and TinyBERT)
often cannot perform well on complex NLP tasks. More importantly, from a
designer's perspective, it is unclear which BERT-based architecture is the
"right" one for a given NLP task, that is, which one strikes the optimal
trade-off between the resources available and the minimum accuracy desired by
the end user. System engineers have to spend a lot of time conducting trial-and-error
experiments to find a suitable answer to this question. This paper presents an
exploratory study of BERT-based models under different resource constraints and
accuracy budgets to derive empirical observations about these resource/accuracy
trade-offs. Our findings can help designers to make informed choices among
alternative BERT-based architectures for embedded systems, thus saving
significant development time and effort.
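To make the resource/accuracy trade-off concrete, the sketch below shows one simple way a designer might shortlist a model once measurements exist: discard every candidate that exceeds the device budget or falls below the accuracy floor, then take the cheapest survivor. This is not the paper's procedure, which the abstract does not describe. The `Candidate` type, the `pick_model` function, the choice of memory and latency as the constrained resources, and every number below are hypothetical placeholders, not measured results.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class Candidate:
    """One profiled model variant (all fields are hypothetical)."""
    name: str
    memory_mb: float
    latency_ms: float
    accuracy: float

def pick_model(
    candidates: List[Candidate],
    max_memory_mb: float,
    max_latency_ms: float,
    min_accuracy: float,
) -> Optional[Candidate]:
    """Return the cheapest candidate that fits the budget, or None."""
    feasible = [
        c for c in candidates
        if c.memory_mb <= max_memory_mb
        and c.latency_ms <= max_latency_ms
        and c.accuracy >= min_accuracy
    ]
    if not feasible:
        return None
    # Assumption: prefer the smallest memory footprint, then lowest latency.
    return min(feasible, key=lambda c: (c.memory_mb, c.latency_ms))

# Placeholder profiles for illustration only; not real measurements.
CANDIDATES = [
    Candidate("variant_large", memory_mb=400.0, latency_ms=120.0, accuracy=0.90),
    Candidate("variant_medium", memory_mb=250.0, latency_ms=60.0, accuracy=0.86),
    Candidate("variant_small", memory_mb=60.0, latency_ms=20.0, accuracy=0.80),
]

if __name__ == "__main__":
    choice = pick_model(CANDIDATES, max_memory_mb=300.0,
                        max_latency_ms=100.0, min_accuracy=0.85)
    print(choice.name if choice else "no candidate fits the budget")
```

When no candidate satisfies both the budget and the accuracy floor, the function returns `None`, which signals that either the constraints or the accuracy target must be relaxed.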